add benchmarks using pytest-benchmark and codspeed #3562
d-v-b merged 43 commits into zarr-developers:main
Conversation
@zarr-developers/steering-council I don't have permission to register this repo with codspeed. I submitted a request to register it, could someone approve it?
done
does anyone have opinions about benchmarks? feel free to suggest something concrete. Otherwise, I think we should take this as-is and handle later benchmarks (like partial shard reads / writes) in a subsequent PR
.github/workflows/codspeed.yml (outdated)

```yaml
- uses: CodSpeedHQ/action@v4
  with:
    mode: instrumentation
    run: hatch run test.py3.11-1.26-minimal:run-benchmark
```
can we test the latest instead? seems more appropriate...
The latest version of python? What's the reasoning? I'd rather update this file when we drop a supported version vs when a new version of python comes out.
Because we'd want to catch a perf regression from upstream changes too? I'm suggesting the latest released versions of the libraries: py=3.13, np=2.2
we don't have an upper bound on numpy versions, so I don't think this particular workflow will help us catch regressions from upstream changes -- we would need to update this workflow every time a new version of numpy is released. IMO that's something we should do in a separate benchmark workflow. This workflow will run on every PR, and in that case the oldest version of numpy we support seems better.
we also don't have to use a pre-baked hatch environment here; we could define a dependency set specific to benchmarking. But my feeling is that benchmarking against older versions of stuff gives us a better measure of what users will actually experience.
indexing please. that'll exercise the codec pipeline too. a peak-memory metric would also be good to track, if possible.
I don't think codspeed or pytest-benchmark do memory profiling; we would need https://pytest-memray.readthedocs.io/en/latest/ or something equivalent for that. An indexing benchmark sounds like a great idea, but I don't think I have the bandwidth for it in this PR right now
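For reference, a minimal sketch of what peak-memory tracking with pytest-memray could look like, assuming its `limit_memory` marker (run with `pytest --memray`). The test name, array shape, and threshold are all illustrative, not part of this PR:

```python
import numpy as np
import pytest


# Hypothetical peak-memory test: pytest-memray fails the test if peak
# allocations exceed the limit. The 24 MB budget is an arbitrary example
# sized a bit above the ~8 MB array allocated below.
@pytest.mark.limit_memory("24 MB")
def test_bulk_read_peakmem():
    data = np.zeros((1024, 1024), dtype="float64")  # ~8 MB allocation
    assert data.sum() == 0.0
```

Without the `--memray` plugin active, the marker is inert and the test runs as a plain assertion.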
…into chore/benchmarks
(Enabled the app)
maxrjones left a comment
IMO it'd be better to skip the tests/benchmarks during regular test runs in the interest of speed
Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>
i think this makes sense -- on my workstation the current benchmark suite takes 40s to run as regular tests, which is a big addition to our total test runtime. The latest changes to this branch skip the benchmarks during regular test runs.
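One way to get that behavior, assuming pytest-benchmark's `--benchmark-skip` flag (the actual PR may gate the benchmarks differently, e.g. via a marker or a separate test directory):

```toml
# pyproject.toml (sketch): skip tests that use the `benchmark` fixture by
# default; a benchmarking run can opt back in with `pytest --benchmark-only`.
[tool.pytest.ini_options]
addopts = "--benchmark-skip"
```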
I'm seeing some flakiness in the codspeed CI run: https://github.com/zarr-developers/zarr-python/actions/runs/21025818794/job/60449955113#step:5:253. Since this is a codspeed-specific issue (from what I can tell), and it doesn't detract from the utility of the benchmark suite for local benchmarking, I'm going to work around this failure by ignoring this warning.
since #3554 was an unpopular direction, I'm going instead with codspeed + pytest-benchmark. Opening as a draft because I haven't looked into how codspeed works at all, but I'd like people to weigh in on whether these initial benchmarks make sense. Naturally we can add more specific ones later, but I figured some bulk array read / write workloads would be a good start.